MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions
Abstract
High-dimensional indexing is widely used for similarity search over many data types, such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data, and scientific databases. Because of the curse of dimensionality, well-known data structures such as the kd-tree, R-tree, and M-tree are known to perform worse on high-dimensional data than a brute-force linear scan. In this paper, we focus on approximate nearest neighbor search for two types of queries: r-Range search and k-NN search. Adapting a novel ring-structure concept, we define a new index structure, MLR-Index (Multi-Layer Ring-based Index), in a metric space and propose time- and space-efficient algorithms with high accuracy. Comprehensive experiments comparing against LSH, the best-known high-dimensional indexing method, show that our approach is faster at similar accuracy and more accurate at similar response time.
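For orientation, the two query types named in the abstract can be illustrated with the brute-force linear scan that the abstract uses as its point of comparison. The sketch below is not the MLR-Index and is not taken from the paper: it is an exact baseline, and the function names (`range_search`, `knn_search`) and the choice of Euclidean distance as the metric are assumptions made for illustration only.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Assumed metric for this sketch; any metric-space distance could be passed instead.
    return math.dist(a, b)


def range_search(data: Sequence[Point], query: Point, r: float,
                 dist: Metric = euclidean) -> List[int]:
    """r-Range search by linear scan: indices of all points within distance r of the query."""
    return [i for i, p in enumerate(data) if dist(p, query) <= r]


def knn_search(data: Sequence[Point], query: Point, k: int,
               dist: Metric = euclidean) -> List[Tuple[float, int]]:
    """k-NN search by linear scan: the k closest points as (distance, index), nearest first."""
    return heapq.nsmallest(k, ((dist(p, query), i) for i, p in enumerate(data)))


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
    q = (0.0, 0.0)
    print(range_search(points, q, r=1.0))
    print(knn_search(points, q, k=2))
```

Both functions compute the distance from the query to every point, so their cost grows linearly with the dataset size. An approximate index such as the one described here aims to avoid most of those distance computations, accepting a small loss of accuracy in exchange.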
Similar resources
PP-Index: Using Permutation Prefixes for Efficient and Scalable Approximate Similarity Search
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by “its view of the surrounding world”, i.e., a list of the elements of a set of reference objects sorted by their distance order with r...
Fast Nearest Neighbor Search in High-Dimensional Space
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor se...
Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing...
QSAR studying of oxidation behavior of Benzoxazines as an important pharmaceutical property
In this work, the electrooxidation half-wave potentials of some benzoxazines were predicted from their structural molecular descriptors using quantitative structure-property relationship (QSAR) approaches. The dataset consists of the half-wave potentials of 40 benzoxazine derivatives, which were obtained by DC polarography. Descriptors which were selected by stepwise multiple selection procedure ar...